Structure Annotation in the Polish Corpus of Suicide Notes
Abstract
The Polish Corpus of Suicide Notes (henceforth PCSN) is constructed to meet the needs of forensic linguistics. Suicide notes are messages created in a borderline situation, shortly before death. Hence the annotation schema requires a complex description of the document structure, the textual content, and its linguistic properties. TEI was selected as the basis for the document encoding schema. The adaptation and extension of TEI are discussed with examples, covering such aspects of encoding as letter structure, various layers of changes and omissions, error correction, and extra-linguistic elements.
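To make the kinds of encoding named in the abstract more concrete, the following is a minimal sketch of how a letter might be marked up with standard TEI elements (`opener`, `closer`, `del`, `add`, `choice`/`sic`/`corr`, `gap`) and then inspected in Python. The sample text, the `SAMPLE` constant, and the `summarize` helper are hypothetical and invented for illustration; they are not taken from PCSN, and the actual PCSN schema may differ. The TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment (invented text, not PCSN data).
# Real TEI documents declare the namespace http://www.tei-c.org/ns/1.0,
# which is left out here to keep the sketch short.
SAMPLE = """
<div type="letter">
  <opener><salute>Dear Anna,</salute></opener>
  <p>I am <del>sory</del><add>sorry</add> for
     <choice><sic>evrything</sic><corr>everything</corr></choice>.
     <gap reason="illegible"/></p>
  <closer><signed>A.</signed></closer>
</div>
"""

def summarize(xml_text):
    """Collect the structural and editorial markup found in one letter."""
    root = ET.fromstring(xml_text)
    return {
        # Letter structure: opening and closing formulae
        "has_opener": root.find("opener") is not None,
        "has_closer": root.find("closer") is not None,
        # Layers of changes made by the writer
        "deletions": [d.text for d in root.iter("del")],
        "additions": [a.text for a in root.iter("add")],
        # Error correction: original form paired with the corrected form
        "corrections": [
            (c.findtext("sic"), c.findtext("corr")) for c in root.iter("choice")
        ],
        # Omissions, e.g. illegible passages
        "gaps": [g.get("reason") for g in root.iter("gap")],
    }

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Keeping the original form (`sic`) next to its correction (`corr`) preserves the writer's own spelling while still allowing normalized text to be extracted, which is one plausible reason to prefer this kind of paired markup.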
Similar resources
Recognition of Genuine Polish Suicide Notes
In this article we present the results of research on the recognition of genuine Polish suicide notes (SNs). We provide a useful method to distinguish between SNs and other types of discourse, including counterfeited SNs. The method uses a wide range of word-based and semantic features and was evaluated using the Polish Corpus of Suicide Notes, which contains 1244 genuine SNs, expanded with a m...
Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish
We present initial results in the named entity annotation subtask of a project aiming at creating the National Corpus of Polish. We summarize the annotation requirements defined for this corpus, and we discuss how existing lexical resources and grammars for Polish named entities have been adapted to meet those requirements. We show first results of the corpus annotation using the information extra...
The Design of Syntactic Annotation Levels in the National Corpus of Polish
This paper presents the procedure of the syntactic annotation of the National Corpus of Polish. Syntactic annotation consists here of shallow parsing and manual post-editing of the results by annotators. The description concentrates on the delimitation of syntactic words and groups, as well as on problems encountered during the annotation process.
The design of Polish Speech Corpus for Unit Selection Speech Synthesis
The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Sectio...
Automatic Detection of Annotation Errors in Polish-Language Corpora
In this article we propose an extension to the variation n-gram based method of detecting annotation errors. We also show an approach to finding anomalies in the morphosyntactic annotation layer by using association rule discovery. As no research has previously been done in the field of morphosyntactic annotation error correction for Polish, we provide novel results based on experiments on the large...